kernelized multi-armed bandit
On Kernelized Multi-Armed Bandits with Constraints
We study a stochastic bandit problem with a general unknown reward function and a general unknown constraint function. Both functions can be non-linear (even non-convex) and are assumed to lie in a reproducing kernel Hilbert space (RKHS) with a bounded norm. In contrast to safety-type hard constraints studied in prior works, we consider soft constraints that may be violated in any round as long as the cumulative violations are small, which is motivated by various practical applications. Our ultimate goal is to study how to utilize the nature of soft constraints to attain a finer complexity-regret-constraint trade-off in the kernelized bandit setting. To this end, leveraging primal-dual optimization, we propose a general framework for both algorithm design and performance analysis.
Open Problem: Tight Bounds for Kernelized Multi-Armed Bandits with Bernoulli Rewards
Mussi, Marco, Drago, Simone, Metelli, Alberto Maria
We consider Kernelized Bandits (KBs) to optimize a function $f : \mathcal{X} \rightarrow [0,1]$ belonging to the Reproducing Kernel Hilbert Space (RKHS) $\mathcal{H}_k$. Mainstream works on kernelized bandits focus on a subgaussian noise model in which observations of the form $f(\mathbf{x}_t)+\epsilon_t$, being $\epsilon_t$ a subgaussian noise, are available (Chowdhury and Gopalan, 2017). Differently, we focus on the case in which we observe realizations $y_t \sim \text{Ber}(f(\mathbf{x}_t))$ sampled from a Bernoulli distribution with parameter $f(\mathbf{x}_t)$. While the Bernoulli model has been investigated successfully in multi-armed bandits (Garivier and Capp\'e, 2011), logistic bandits (Faury et al., 2022), bandits in metric spaces (Magureanu et al., 2014), it remains an open question whether tight results can be obtained for KBs. This paper aims to draw the attention of the online learning community to this open problem.